adaptive testing to assess spatial ability. Data was collected from
high school students on two types of spatial items: three-dimensional 
cubes and hidden figure items. The analysis of the three-dimensional 
cubes focused on the fit of the simplest possible item response model 
capable of modeling response time; the analysis of the hidden figure
item focused on the feasibility of generating items from an algorithm 
in such a way that the psychometric characteristics of the generated 
items were predictable. The results for the three-dimensional cube 
items suggested that angular disparity can be used effectively to 
control the difficulty of true items, but this was not the case for 
false items. That is, true and false items appear to measure 
different aspects of performance, and as a result, a multidimensional 
item response model may be necessary to fully account for 
performance. The analysis of the hidden figure items showed that an
item generation algorithm can be formulated to produce items of 
similar psychometric characteristics. The practical and theoretical 
implications of the results are discussed. (Author/JAZ)
Abstract



This report summarizes the results of an 18-month contract
entitled Adaptive Assessment of Spatial Ability. The project
focused on the psychometric and technological feasibility of adaptive
testing systems of a procedural, as opposed to declarative, nature,
that is, adaptive testing systems in which items are generated as needed
rather than explicitly retrieved from a database. To investigate the
feasibility of such a testing system, data were collected from high
school students on two types of spatial items: three-dimensional cubes
and hidden figure items. The analysis of the three-dimensional cubes
focused on the fit of the simplest possible item response model
capable of modeling response time; the analysis of the hidden figure
items focused on the feasibility of generating items from an algorithm
in such a way that the psychometric characteristics of the generated
items were predictable. The results for the three-dimensional cube
items suggested that angular disparity can be used effectively to
control the difficulty of true items, but this was not the case for
false items. That is, true and false items appear to measure
different aspects of performance, and as a result a multidimensional
item response model may be necessary to fully account for
performance on even fairly simple spatial items such as
three-dimensional cubes. The analysis of the hidden figure items
showed that an item generation algorithm can be formulated to produce
items of similar psychometric characteristics. The practical and
theoretical implications of the results are discussed.
Final Report: Adaptive Assessment of Spatial Abilities

Isaac I. Bejar



As the title of this project suggests, the aim of this research is
to study the feasibility and requirements of adaptive testing for spatial
ability. However, although the content of the research has been spatial
abilities, the goal is in fact broader, namely to develop a methodology
for what might be called second-generation adaptive testing that will be
applicable not only to spatial but to other abilities as well.

First-generation adaptive testing methodology is well known and can 
be summarized as follows: Given a pool of items calibrated on a common 
scale, choose the set of items that is maximally informative for a given 
examinee. This methodology has now reached the point where it is a 
marketable product, and while there may still exist a need to do research 
on refinements of the methodology, the basic structure of the paradigm is 
well set. 
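The selection rule just summarized can be sketched in a few lines. This is a minimal sketch, assuming a two-parameter logistic (2PL) item response model and a pool stored as a list of (a, b) parameter pairs; the report does not specify the model, and the function names and parameter values are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, pool, administered):
    """Index of the not-yet-administered item that is most informative at theta.

    `pool` is a list of (a, b) pairs calibrated on a common scale;
    `administered` is a set of indices already given to this examinee.
    """
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))


pool = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.5)]  # illustrative parameters only
print(select_next_item(0.1, pool, administered=set()))  # → 1
```

Note that every candidate item has to exist in the pool with its own stored parameters; this is the declarative character discussed next.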



A characteristic of first-generation adaptive testing is its declar-
ative nature. That is, each item in the pool must be stored explicitly
in a database along with its psychometric parameters with respect to some
item response model. A natural elaboration of this approach was investi-
gated in this project. That is, instead of explicitly enumerating
all the items, we investigated the idea of constructing algorithms that
generate the items with control of their psychometric characteristics.
Rather than calibrating specific items, we calibrated the procedures that
generate the items. In short, the elaboration moves from a declarative
approach to a procedural one.
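To make the contrast concrete, the sketch below calibrates a procedure rather than an item: difficulty is predicted from design features, and an item is produced on demand for a target difficulty. This is a minimal sketch assuming a linear feature-to-difficulty model (in the spirit of the linear logistic test model); the class, the feature names, and the weights are hypothetical and are not taken from the report.

```python
from dataclasses import dataclass, field


@dataclass
class CalibratedProcedure:
    """Hypothetical item-generating procedure calibrated as a whole:
    difficulty (in logits) is intercept + sum of weight * feature value."""
    intercept: float
    weights: dict = field(default_factory=dict)

    def predicted_difficulty(self, design):
        return self.intercept + sum(w * design[name] for name, w in self.weights.items())

    def generate(self, design):
        # A real generator would render a new item from `design`; this stub
        # only records the design and its predicted difficulty.
        return {"design": dict(design), "b": self.predicted_difficulty(design)}


def item_for_ability(procedure, designs, theta):
    """Generate the item whose predicted difficulty is closest to theta."""
    best = min(designs, key=lambda d: abs(procedure.predicted_difficulty(d) - theta))
    return procedure.generate(best)


proc = CalibratedProcedure(-2.0, {"n_elements": 0.5, "n_rules": 1.0})
designs = [{"n_elements": k, "n_rules": k // 2} for k in (2, 4, 6)]
print(item_for_ability(proc, designs, theta=1.8)["b"])  # → 2.0
```

Only the intercept and the weights are calibrated; any number of designs can then be realized without calibrating each resulting item separately.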



Clearly, procedural adaptive testing involves more than psycho-
metrics, since the encoding of items into procedures requires very
specific knowledge about the determinants of item performance. It is
precisely this requirement that offers some hope of improving the
validational status of scores from an adaptive testing procedure. In the
current approach to adaptive testing, improvement in validity is limited
to the improvement accruing from more precise measurement. There is hope
that the next generation of adaptive testing will improve the valida-
tional status of test-score interpretations by continually submitting to
testing the theory of item performance embedded in the item-generation
algorithm. As a result of that continual challenge, the theory will
either be confirmed or revised, and it is very likely that in that
process we will learn much about the psychological underpinnings of
performance on the test.



The calibration of a procedure consists of linking the determinants
of item performance to a psychometric scale. The details of how
this is done vary with the item type. In this project, we experimented
with a three-dimensional mental rotation item and a hidden-figure item
type.
The Psychometrics of Three-dimensional Mental Rotation

An example of this item type is shown in Figure 1. This item type 
was chosen because there exists a large body of literature (cf., 
Corballis, 1982) establishing that an angular disparity between the two 
figures largely determines performance. Moreover, it appears that there
are fairly stable and consistent gender differences in performance on
mental-rotation tasks (Linn and Petersen, 1985).

The approach taken was to examine the simplest possible psychometric
model of an 80-item test based on figures such as those in Figure 1.
(There were eight basic items presented at five angles in their true and
false versions.) The items were presented at angular disparities of 20,
60, 100, 140, and 180 degrees in order to establish the relationship
between angular disparity and difficulty. The simplest model that can be
fitted to these data makes the following predictions:

- The relationship between difficulty and angular 
disparity is linear. 

- The slope of that relationship is constant at 
different response times. 

- The intercept of the relationship is solely a 
function of response time. 
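Written as a formula, the three predictions say that when a response counts as correct only if it is given within t seconds, the difficulty of an item presented at angular disparity a is b(a, t) = c(t) + s * a, with a single slope s and an intercept c(t) that depends only on t. The sketch below encodes this with a Rasch-type logistic link, which is an assumption; the parameter values are hypothetical and are not estimates from the study.

```python
import math

ANGLES = (20, 60, 100, 140, 180)  # angular disparities used in the study (degrees)

# Hypothetical parameters: one shared slope, one intercept per time cutoff (seconds).
SLOPE = 0.015
INTERCEPTS = {3: 1.5, 4: 0.5, 5: -0.2, 6: -0.7, 7: -1.0}


def difficulty(angle, t, slope=SLOPE, intercepts=INTERCEPTS):
    """Difficulty is linear in angle (prediction 1), the slope does not depend on
    the cutoff t (prediction 2), and the intercept depends only on t (prediction 3)."""
    return intercepts[t] + slope * angle


def p_correct_within(theta, angle, t):
    """Probability that an examinee of ability theta answers correctly within t seconds."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty(angle, t))))


# Because the slope is shared, the difficulty lines for different cutoffs are parallel.
for t in sorted(INTERCEPTS):
    print(t, [round(difficulty(a, t), 2) for a in ANGLES])
```

A calibration that fits this model well shows parallel straight lines when difficulty is plotted against angular disparity for each time cutoff.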

This model is an extension of the [...] to the case in which the
response is a response time (see Samejima, 1973). Thus, to score an
examinee, we simply note the response time to an item with a certain
angular disparity. Together, the angular disparity and response time
determine the corresponding difficulty, and they allow us to obtain an
ability score for this examinee.
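One way to turn this scoring idea into an ability estimate is sketched below. It assumes, beyond what the report states, that each response is dichotomized at a single time cutoff (correct within the cutoff or not), that a linear difficulty model with a Rasch-type link applies, and that ability is estimated by maximum likelihood with Newton-Raphson steps. The function names and parameter values are hypothetical.

```python
import math


def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, cutoff, slope, intercepts, theta=0.0, max_iter=50):
    """Maximum-likelihood ability from (angle, response_time, correct) triples.

    A response is scored 1 if it is correct and given within `cutoff` seconds,
    otherwise 0; its difficulty is intercepts[cutoff] + slope * angle.
    Returns None when every score is 0 or every score is 1, because the
    ML estimate is then infinite.
    """
    scored = [(intercepts[cutoff] + slope * angle, int(correct and rt <= cutoff))
              for angle, rt, correct in responses]
    total = sum(u for _, u in scored)
    if total in (0, len(scored)):
        return None
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b, _ in scored]
        gradient = sum(u - p for (_, u), p in zip(scored, probs))  # d logL / d theta
        information = sum(p * (1.0 - p) for p in probs)            # -d2 logL / d theta2
        step = gradient / information
        theta += step
        if abs(step) < 1e-10:
            break
    return theta


responses = [(100, 3.2, True), (100, 6.0, True), (20, 2.1, True), (180, 4.4, False)]
print(round(estimate_ability(responses, cutoff=5, slope=0.01, intercepts={5: -0.5}), 3))
```

Scoring at a single cutoff keeps the dichotomized responses locally independent; using several cutoffs for the same response would count it more than once.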



Figure 2 shows the result of a calibration for a typical item based
on the responses of nearly 200 high school students. As can be seen,
there are some departures from the predictions, although, in general, the
fit for this item is good. The major deviation occurs at 100 degrees.
Also, beyond 5 seconds, a tendency towards a quadratic relationship
between difficulty and angular disparity emerges, a situation which
suggests that beyond a certain moment in time different strategies come
into play.

The results for the false items are quite different, in that angular
disparity does not seem to control performance as it does for the true 
items. That is, the false items seem to tap the decision aspect of 
performance, while the true items are tapping the mental rotation aspect. 
Figure 3 shows the corresponding data. 

Figure 2

Relationship Between Psychometric Difficulty and Angular Disparity
After 3, 4, 5, 6 and 7 Seconds for True Version of Item E1

Figure 3

Relationship Between Psychometric Difficulty and Angular Disparity
After 3, 4, 5, 6 and 7 Seconds for False Version of Item E1

The results of this study are presented in more detail in The
Psychometrics of Mental Rotation (RR-86-19). It is concluded that in
practical applications, the appropriate psychometric model for this item
type is a two-dimensional one. However, in a computerized testing
environment, it may be unnecessary to embellish the psychometric model to
account for curvilinear relationships between angular disparity and
difficulty. Instead, in tailoring the test we would choose items for an
individual in such a way that a response is given within, say, 5 seconds.
Such a tailoring strategy may have other benefits as well.
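The tailoring strategy can be sketched by inverting the 5-second difficulty line: given a provisional ability estimate, choose the angular disparity at which the examinee is expected to answer correctly within 5 seconds with some target probability. The target probability, the Rasch-type link, the parameter values, and the function name are assumptions for illustration.

```python
import math

ANGLES = (20, 60, 100, 140, 180)  # angular disparities available in the item design


def angle_for_target(theta, slope, intercept_5s, p_target=0.7, angles=ANGLES):
    """Pick the available angle at which an examinee with ability `theta` has
    (approximately) probability `p_target` of a correct response within 5 s,
    assuming difficulty = intercept_5s + slope * angle and a logistic link."""
    target_logit = math.log(p_target / (1.0 - p_target))
    # Solve theta - (intercept_5s + slope * angle) = target_logit for angle.
    ideal = (theta - target_logit - intercept_5s) / slope
    # Only the designed angles exist, so take the nearest one.
    return min(angles, key=lambda a: abs(a - ideal))


print(angle_for_target(1.0, slope=0.01, intercept_5s=-0.5))  # → 60
```

Raising `p_target` selects smaller disparities and so makes a response inside the 5-second window more likely; which probability to aim for is a design choice the report leaves open.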



Hidden Figure Items

Unlike the mental-rotation items, for which the determinants of
performance are fairly well known, very little is known about the deter-
minants of performance in hidden-figure items. Therefore, our first task
was to discover a psychometrically useful representation of the item.
There were two important constraints on that representation. One was
that it should provide a description of the item that captures the
"psychometric essence" of the items. Ideally, that representation should
be psychologically motivated, that is, motivated by previous research on
the processes and mental models that account for performance on this type
of cognitive task. Unfortunately, for the hidden-figure item, it was not
possible to locate the relevant research. In addition, the representa-
tion should lend itself to generating items that had the same underlying
representation but a different visual realization. For convenience, we
call the items generated in this fashion clones. Figure 4 shows a pair
of clones.

The chosen representation was a matrix consisting of counts indicating
how closely the target figure is matched at each possible position in
the larger pattern, and was based on the Hough transform (Mayhew and
Frisby, 1984), an artificial intelligence technique used in object
recognition. We tested the psychometric validity of this representation
by implementing a computer program capable of generating psychometric
clones and then by comparing their psychometric characteristics on the
basis of responses from high school students.

The item generation algorithm takes the matrix of counts together
with a small pattern and tries to create a large pattern that matches the
matrix. The generation process is simplified by the fact that patterns
only contain horizontal, vertical, and 45-degree lines between nodes.
The basic idea is to start with a large pattern including all the
possible lines and remove lines until the matching algorithm produces a
matrix that equals the input matrix.
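The generate-by-removal idea can be sketched as follows. This is a minimal illustration, not the report's program: it assumes a square lattice of nodes, matches the small figure under translation only (a simple Hough-style vote in which every target line found at a position adds one to that position's count), and removes lines greedily, always taking the removal that brings the count matrix closest to the input matrix. All names are hypothetical, and the greedy search can stop without an exact match, which the function reports.

```python
from itertools import product

# Horizontal, vertical, and the two 45-degree directions between lattice nodes.
DIRECTIONS = ((1, 0), (0, 1), (1, 1), (1, -1))


def all_edges(n):
    """Every line between neighbouring nodes of an n-by-n lattice, as sorted node pairs."""
    edges = set()
    for x, y in product(range(n), repeat=2):
        for dx, dy in DIRECTIONS:
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < n and 0 <= y2 < n:
                edges.add(tuple(sorted(((x, y), (x2, y2)))))
    return edges


def shift(edge, dx, dy):
    (x1, y1), (x2, y2) = edge
    return ((x1 + dx, y1 + dy), (x2 + dx, y2 + dy))


def count_matrix(pattern, target, n):
    """For every translation that keeps the target inside the lattice, count how
    many of the target's lines are present in the large pattern."""
    xs = [x for edge in target for x, _ in edge]
    ys = [y for edge in target for _, y in edge]
    return {(dx, dy): sum(shift(e, dx, dy) in pattern for e in target)
            for dx in range(-min(xs), n - max(xs))
            for dy in range(-min(ys), n - max(ys))}


def generate(goal, target, n):
    """Start from the full lattice and greedily delete lines until the count
    matrix equals `goal`. Returns (pattern, matched)."""
    pattern = all_edges(n)
    counts = count_matrix(pattern, target, n)
    while counts != goal:
        best, best_delta, best_hits = None, 0, []
        for edge in sorted(pattern):
            # Positions whose count drops by one if this line is deleted.
            hits = []
            for t in target:
                d = (edge[0][0] - t[0][0], edge[0][1] - t[0][1])
                if d in counts and shift(t, *d) == edge:
                    hits.append(d)
            # Change in total absolute mismatch caused by the deletion.
            delta = sum(-1 if counts[d] > goal[d] else 1 for d in hits)
            if delta < best_delta:
                best, best_delta, best_hits = edge, delta, hits
        if best is None:
            return pattern, False  # no single deletion reduces the mismatch
        pattern.remove(best)
        for d in best_hits:
            counts[d] -= 1
    return pattern, True


# A small L-shaped target figure and a "source" item made from a 3-by-3 lattice.
target = {((0, 0), (1, 0)), ((0, 0), (0, 1))}
source = all_edges(3) - {((0, 0), (1, 0)), ((1, 1), (1, 2))}
goal = count_matrix(source, target, 3)
clone, matched = generate(goal, target, 3)
print(matched, clone == source)  # → True False
```

The generated pattern has the same count matrix as the source but a different set of lines, which is the sense in which the two are clones; whether such clones are psychometrically equivalent still has to be checked with examinee data.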



The results demonstrated that the clones behaved as such in terms of
their difficulty as well as their distribution of response times.
Figure 5 shows, for pairs of clones, the relationship between their
logits of proportion correct as well as between their mean response
times. Figure 6 shows the cumulative response times for two clones. It
can be seen that they are very similar, and this was true for the other
items as well. The results of this experiment appear in more detail in
Analysis and Generation of Hidden Figure Items: A Cognitive Approach to
Psychometric Modeling (RR-86-20).
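The two comparisons described above can be computed as in the sketch below: the logit of each clone's proportion correct, and the largest vertical gap between the two clones' cumulative response-time distributions (the two-sample Kolmogorov-Smirnov statistic, which is one possible summary and not necessarily the one used in the report). The data layout (one 0/1 score or one time in seconds per examinee) and the function names are assumptions.

```python
import math
from bisect import bisect_right


def logit_proportion_correct(scores):
    """Logit of the proportion of 1s in a list of 0/1 scores (requires 0 < p < 1)."""
    p = sum(scores) / len(scores)
    return math.log(p / (1.0 - p))


def ecdf(times):
    """Empirical cumulative distribution function of a sample of response times."""
    ordered = sorted(times)
    return lambda t: bisect_right(ordered, t) / len(ordered)


def max_cdf_gap(times_a, times_b):
    """Largest vertical distance between two empirical CDFs. Both are step
    functions that change only at observed times, so checking those suffices."""
    fa, fb = ecdf(times_a), ecdf(times_b)
    return max(abs(fa(t) - fb(t)) for t in set(times_a) | set(times_b))


print(round(logit_proportion_correct([1, 1, 1, 0]), 4))  # → 1.0986
print(max_cdf_gap([1.2, 2.0, 3.5, 4.1], [1.2, 2.0, 3.5, 5.0]))  # → 0.25
```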
Figure 5

Relationship Between Accuracy and Latency for Hidden Figure Clones
Figure 6



Cumulative Frequency Distribution of Response Times for Two Clones
Summary

The choice of item types in this study was not accidental: they
were chosen to maximize the chance of a positive demonstration of what we
have called "procedural adaptive testing." The essential characteristic
of procedural adaptive testing is that, unlike "conventional" adaptive
testing, all the items and their associated item parameter estimates need
not be stored ahead of time in a database. Instead, through a design
incorporating the major determinants of performance on the item type, data
are collected to determine the relationship between design and psycho-
metric parameters. This simple distinction, however, has important
ramifications.



At a practical level, procedural adaptive testing is likely to be
more economical since it avoids the need to calibrate a large number of
items. This economy may prove advantageous even in paper-and-pencil
tests by facilitating the creation of a priori parallel forms and, in
general, by better controlling the psychometric characteristics of the
items that are placed on the test. (In fact, the item-generation program
developed for the hidden-figure item has been used in the development of
a Navy pilot test.)



However, the most important implication of procedural adaptive
testing may not be its practical value but the constraint that it imposes
on the psychometrician. It is no longer sufficient to gather, calibrate,
and link items, as if these tasks were not demanding enough. To imple-
ment a procedural adaptive test, it is also necessary to have a theory of
item performance at a level of specificity such that new items can be
produced on-line and under computer control. These are not trivial
requirements, especially in verbal domains. Thus, in attempting to
fulfill this requirement it will be necessary to gather documentation of
psychological research related to performance on the item type in
question and, if that knowledge is not yet available, to obtain it. This
process will inevitably lead to a better understanding of test scores.

Conclusions 

Psychologists, from psychometric and cognitive perspectives, have
been interested in spatial ability for some time. Psychometricians
should clearly be credited with the discovery and initial study of
"spatial abilities." But it is equally clear that cognitive psychol-
ogists deserve credit for the understanding we have today about the
nature of those abilities. Having a better understanding, however, does
not mean that we are more certain about how to measure spatial abilities.
Just and Carpenter (1985), for example, concluded that "item and test
difficulty may be major determinants of what strategies and processes
will be evoked in a task." By suggesting that item and test difficulty
are causes of those strategies and processes rather than their result,
they seem to suggest that psychometric and psychological models are
concerned with different phenomena. The alternative view is that not
only are both models attempting to explain different manifestations of
the same phenomena, but in addition the parameters of the psychometric
model ought to be explainable by the psychological theory.
Adopting this view creates the potential for measurement instruments
that are both theoretically and psychometrically sound. Although this
project focused from the start on the development of more advanced adap-
tive tests, it seems that even if this had not been the case, the conclu-
sion about the need for adaptive testing would have been inescapable.
If, as Just and Carpenter suggest, different strategies are invoked by
items of a certain difficulty level, then it appears that a valuable
contribution of adaptive testing is that it prevents the use of different
strategies by controlling the difficulty of the items presented to the
examinee. The three-dimensional rotation data collected as part of this
project suggest that different strategies may emerge if an examinee has
not made a decision after five seconds. In an adaptive test it would be
relatively simple to select items in such a way that the response would
be given within, say, five seconds. This motivation for tailoring does
not negate the value [...] to choose different strategies. Rather,
through better control of what a given test measures, we are likely to
improve the precision and validity of test outcomes. Indeed, we may be
able to detect with more certainty the presence of alternative strategies
by being able to identify respondents that depart from an expected
pattern of performance.

Other Reports 



Bejar, I. I. (1985). Speculations on the future of test design. In
Susan Embretson (Ed.), Test design: Developments in psychology and